{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://raw.githubusercontent.com/Qiskit/qiskit-tutorials/master/images/qiskit-heading.png\" alt=\"Note: In order for images to show up in this jupyter notebook you need to select File => Trusted Notebook\" width=\"500 px\" align=\"left\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## _*Comparing Strings with Quantum Superpositon*_ \n",
    "\n",
    "The latest version of this notebook is available on https://github.com/QISKit/qiskit-tutorial.\n",
    "\n",
    "For more information about how to use the IBM Q Experience (QX), consult the [tutorials](https://quantumexperience.ng.bluemix.net/qstage/#/tutorial?sectionId=c59b3710b928891a1420190148a72cce&pageIndex=0), or check out the [community](https://quantumexperience.ng.bluemix.net/qstage/#/community).\n",
    "\n",
    "\n",
    "***\n",
    "### Contributors\n",
    "Rudy Raymond"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Motivation\n",
    "\n",
    "If we can use quantum states to represent genetic codes, we may be able to compare them, and/or find similar genetic codes quickly. \n",
    "\n",
    "For example, according to [this site](http://www.bioinformatics.org/sms2/genetic_code.html) the starts of the genetic codes for the Yeast Mitochondrial, Protozoan Mitochondrial, and Bacterial Code are respectively as follow. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-09-25T16:52:52.597525Z",
     "start_time": "2018-09-25T16:52:52.593598Z"
    }
   },
   "outputs": [],
   "source": [
    "YEAST     = \"----------------------------------MM----------------------------\"\n",
    "PROTOZOAN = \"--MM---------------M------------MMMM---------------M------------\"\n",
    "BACTERIAL = \"---M---------------M------------MMMM---------------M------------\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that each of the codes is represented by a bitstring of length 64. By comparing characters at the same position in the strings, we can see that Protozoan's is closer to Bacterial's than Yeast's. \n",
    "\n",
    "Exploiting quantum superposition, we can create quantum states by using only 7 qubits such that each of the quantum states corresponds to the genetic code of Yeast, Protozoan, and Bacterial. We then compare the closeness of their genetic codes by comparing their quantum states, which is made possible by the reversibility of quantum circuit.\n",
    "\n",
    "The reversibility of quantum circuit to test the similarity of quantum states works as follow. Assume that we can create a quantum superposition starting from all-zero states by a quantum circuit. Then by inverting the same quantum circuit and we give it the same quantum superposition as input, we will get exactly all-zero bits as the output. Now, when we give a similar quantum superposition as input to the inverted circuit, we can still get all-zero bits as the output with probability proportional to the similarity of the quantum states: the more similar, the more we observe all-zero bits. \n",
    "\n",
    "Thus, to decide which code (*Yeast's* *or* *Bacterial's*) is the most similar to the Protozoan, we can do the following:\n",
    "\n",
    "1. We first prepare the quantum state that encodes the Protozoan's\n",
    "2. We then use the quantum state as inputs to the inverted circuits that each prepare the quantum state of Yeast's and Bacterial's. Run and measure the circuits\n",
    "3. Output the name of the inverted circuit whose measurements result in more frequent measurements of all-zero bits. \n",
    "\n",
    "\n",
    "## Quantum Superposition for Bitstrings\n",
    "\n",
    "A qubit can be in a superposition of two basis states: \"0\" and \"1\" at the same time. Going further, two qubits can be in a superposition of four basis states: \"00\", \"01\", \"10\", and \"11\". In general, $n$ qubits can be in a superposition of $2^n$ (exponential in the number of qubits!) basis states. \n",
    "\n",
    "Here, we show a simple example to create quantum superpositon for bitstrings and use them to compare the similarity between two bitstrings. This tutorial makes use the [quantum state initialization function](https://nbviewer.jupyter.org/github/QISKit/qiskit-tutorial/blob/master/reference/tools/quantum_gates_and_linear_algebra.ipynb#Arbitrary-initialization) and **circuit inversion**. It also illustrates the power of loading data into quantum states. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Comparing bitstrings of length 64 with 7 qubits\n",
    "\n",
    "Let say we have three genetic codes as above.\n",
    "\n",
    "```\n",
    "YEAST     = \"----------------------------------MM----------------------------\"\n",
    "PROTOZOAN = \"--MM---------------M------------MMMM---------------M------------\"\n",
    "BACTERIAL = \"---M---------------M------------MMMM---------------M------------\"\n",
    "```\n",
    "\n",
    "Let use 7 qubits to encode the above codes: the first 6 qubits for indexing the location in the code (because we have 64 positions that we number from 0 to 63), and the last qubit for the content of the code (we use \"0\" for \"-\" and \"1\" for \"M\"). Thus, numbering the position of the code from left to right, we can create quantum states for each of the code as below: \n",
    "\n",
    "\\begin{eqnarray}\n",
    "|YEAST \\rangle &=& \\frac{1}{8} \\left( |000000\\rangle |0\\rangle +  |000001\\rangle |0\\rangle  + |000010\\rangle |0\\rangle + |000011\\rangle |0\\rangle + \\ldots \\right) \\\\\n",
    "|PROTOZOAN \\rangle &=& \\frac{1}{8} \\left( |000000\\rangle |0\\rangle +  |000001\\rangle |0\\rangle + |000010\\rangle |1\\rangle + |000011\\rangle |1\\rangle + \\ldots \\right) \\\\\n",
    "|BACTERIAL \\rangle &=& \\frac{1}{8} \\left( |000000\\rangle |0\\rangle +  |000001\\rangle |0\\rangle + |000010\\rangle |0\\rangle + |000011\\rangle |1\\rangle + \\ldots \\right)\n",
    "\\end{eqnarray}\n",
    "\n",
    "The first four codes of Yeast's are all \"-\", and therefore at the above all of the second registers of the corresponding state are \"0\". And so on. \n",
    "\n",
    "### Creating quantum superposition for genetic codes\n",
    "\n",
    "Below is the python function to create a quantum superposition for a given genetic code as above. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-09-25T16:52:53.178637Z",
     "start_time": "2018-09-25T16:52:53.170526Z"
    }
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "import numpy as np\n",
    "import math\n",
    "from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister\n",
    "from qiskit import execute"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-09-25T16:52:53.351971Z",
     "start_time": "2018-09-25T16:52:53.339024Z"
    }
   },
   "outputs": [],
   "source": [
    "def encode_bitstring(bitstring, qr, cr, inverse=False):\n",
    "    \"\"\"\n",
    "    create a circuit for constructing the quantum superposition of the bitstring\n",
    "    \"\"\"\n",
    "    n = math.ceil(math.log2(len(bitstring))) + 1                 #number of qubits\n",
    "    assert n > 2, \"the length of bitstring must be at least 2\"\n",
    "    \n",
    "    qc = QuantumCircuit(qr, cr)\n",
    "    \n",
    "    #the probability amplitude of the desired state\n",
    "    desired_vector = np.array([ 0.0 for i in range(2**n) ])     #initialize to zero\n",
    "    amplitude = np.sqrt(1.0/2**(n-1))\n",
    "    \n",
    "    for i, b in enumerate(bitstring):\n",
    "        pos = i * 2\n",
    "        if b == \"1\" or b == \"M\":\n",
    "            pos += 1\n",
    "        desired_vector[pos] = amplitude\n",
    "    if not inverse:\n",
    "        qc.initialize(desired_vector, qr)\n",
    "        qc.barrier(qr)\n",
    "    else:\n",
    "        qc.initialize(desired_vector, qr).inverse()  #invert the circuit\n",
    "        for i in range(n):\n",
    "            qc.measure(qr[i], cr[i])\n",
    "    print()\n",
    "    return qc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "We can now create quantum circuits to create the quantum states for the Yeast's, Protozoan's, and Bacterial's."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-09-25T16:52:53.884331Z",
     "start_time": "2018-09-25T16:52:53.649109Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "n = math.ceil(math.log2(len(YEAST))) + 1                 #number of qubits\n",
    "qr = QuantumRegister(n)\n",
    "cr = ClassicalRegister(n)\n",
    "\n",
    "qc_yeast     = encode_bitstring(YEAST, qr, cr)\n",
    "qc_protozoan = encode_bitstring(PROTOZOAN, qr, cr)\n",
    "qc_bacterial = encode_bitstring(BACTERIAL, qr, cr)\n",
    "\n",
    "circs = {\"YEAST\": qc_yeast, \"PROTOZOAN\": qc_protozoan, \"BACTERIAL\": qc_bacterial}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Inverting quantum circuit\n",
    "\n",
    "We can easily invert a quantum circuit by `inverse()` function. These inversed circuits are desirable to compute the closeness of the quantum states. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-09-25T16:52:54.207679Z",
     "start_time": "2018-09-25T16:52:53.960262Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "inverse_qc_yeast     = encode_bitstring(YEAST,     qr, cr, inverse=True)\n",
    "inverse_qc_protozoan = encode_bitstring(PROTOZOAN, qr, cr, inverse=True)\n",
    "inverse_qc_bacterial = encode_bitstring(BACTERIAL, qr, cr, inverse=True)\n",
    "\n",
    "inverse_circs = {\"YEAST\": inverse_qc_yeast, \"PROTOZOAN\": inverse_qc_protozoan, \"BACTERIAL\": inverse_qc_bacterial}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Comparing bitsrings\n",
    "\n",
    "We can now compare how close the starts of the genetic codes of Protozoan to Yeast's and Bacterial's by performing the test."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-09-25T16:54:49.528857Z",
     "start_time": "2018-09-25T16:52:54.299872Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Similarity score of PROTOZOAN and YEAST is 0.827\n",
      "Similarity score of PROTOZOAN and BACTERIAL is 0.965\n",
      "[ANSWER] PROTOZOAN is most similar to BACTERIAL\n"
     ]
    }
   ],
   "source": [
    "from qiskit import IBMQ, BasicAer\n",
    "\n",
    "key = \"PROTOZOAN\"       #the name of the code used as key to find similar ones\n",
    "\n",
    "# use local simulator\n",
    "backend = BasicAer.get_backend(\"qasm_simulator\")\n",
    "shots = 1000\n",
    "\n",
    "combined_circs = {}\n",
    "count = {}\n",
    "\n",
    "most_similar, most_similar_score = \"\", -1.0\n",
    "\n",
    "for other_key in inverse_circs:\n",
    "    if other_key == key:\n",
    "        continue\n",
    "        \n",
    "    combined_circs[other_key] = circs[key] + inverse_circs[other_key]   #combined circuits to look for similar codes\n",
    "    job = execute(combined_circs[other_key], backend=backend,shots=shots)\n",
    "    st = job.result().get_counts(combined_circs[other_key])\n",
    "    if \"0\"*n in st:\n",
    "        sim_score = st[\"0\"*n]/shots\n",
    "    else:\n",
    "        sim_score = 0.0\n",
    "    \n",
    "    print(\"Similarity score of\",key,\"and\",other_key,\"is\",sim_score)\n",
    "    if most_similar_score < sim_score:\n",
    "        most_similar, most_similar_score = other_key, sim_score\n",
    "\n",
    "print(\"[ANSWER]\", key,\"is most similar to\", most_similar)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "We observe that the test can be used to determine which code is closer: bacterial's is closer to protozoan's than yeast's. \n",
    "\n",
    "There are many other genetic codes listed at [bioinformatics.org](http://www.bioinformatics.org/sms2/genetic_code.html) which can be used as input strings. In general, DNA has four nucleotides: \"A\", \"C\", \"G\", and \"T\". Thus, instead of one qubit like in this notebook, two qubits are required to encode the nucleotides. However, the asymptotic amount of quantum bits for encoding the whole sequence of length $N$ is still in the order of $\\log{N}$, which is exponentially small. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Deep Dive\n",
    "\n",
    "The technique of using circuit inversion to measure how close two quantum states has been used in many literature. For example, it is used for Quantum Kernel Estimation in [Havlicek et al., 2018](https://arxiv.org/pdf/1804.11326.pdf) for supervised learning. The idea of using quantum superposition to encode bistrings appeared in [Quantum Fingerprinting](https://arxiv.org/pdf/quant-ph/0102001.pdf) where a quantum exponential advantage is shown for a communication task of comparing two bitstrings. \n",
    "\n",
    "The intuition of why combining a circuit which creates a quantum state with another circuit which is the inverted circuit of creating another quantum state can be used to measure how close two quantum states is as follow. \n",
    "\n",
    "All operations (except measurements) in quantum computers are unitary and hence, distance preserving. This means if we apply the same operation (or, circuit) to two states that are similar, the resulting states will also be similar. All those operations are also reversible, that means, if we know a circuit $C$ to create a particular quantum state $|\\phi\\rangle$ from the all-zero state, we can also design the circuit $C'$ that transforms back the quantum state $|\\phi\\rangle$ to the all-zero state. Now, if we apply $C'$ to a quantum state $|\\psi\\rangle$ which is similar to $|\\phi\\rangle$, we will obtain a quantum state which is also similar to the all-zero state. The distance of the resulting state to the all-zero state is the same as the distance between  $|\\phi\\rangle$ and $|\\psi\\rangle$. \n",
    "\n",
    "\n",
    "We can notice that the similarity of two different quantum states can be very close to zero, and thus making difficult to find the discrepancies. However, we can use encoding techniques, such as, by [employing repetition code](https://arxiv.org/abs/1709.00990), to guarantee that different quantum states are separated far enough. In general, we can exploit error correcting codes, such as, [Justesen code](https://en.wikipedia.org/wiki/Justesen_code), or [locality-sensitive hashing](https://en.wikipedia.org/wiki/Locality-sensitive_hashing) to encode bitstrings efficiently. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  },
  "toc": {
   "colors": {
    "hover_highlight": "#DAA520",
    "running_highlight": "#FF0000",
    "selected_highlight": "#FFD700"
   },
   "moveMenuLeft": true,
   "nav_menu": {
    "height": "48px",
    "width": "252px"
   },
   "navigate_menu": true,
   "number_sections": true,
   "sideBar": true,
   "threshold": 4,
   "toc_cell": false,
   "toc_section_display": "block",
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
